Integrating Cloud Monitoring with a Local Prometheus

Background

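In hybrid-cloud scenarios, we generally do not want monitoring to become a data silo that makes management and data consumption more complex.

Imagine having dozens of accounts across multiple clouds and needing to log in to each vendor's console, account by account, just to look at monitoring data. That is painful.

We already have a sufficiently mature monitoring stack. All we need is to collect the data into our self-hosted system for unified storage.

There are three ways to collect it:

The public cloud's cloud monitoring API
The public cloud's own cloud monitoring exporter
A third-party exporter

In general, the cloud's own monitoring exporter is recommended because it saves time and effort.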

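Key terms

Main account: the primary account of the public cloud tenant.
IAM user: a sub-account. Once it has been assigned API credentials and granted the relevant permissions, it calls the API with those credentials.
Region: the geographic region. Cloud vendors isolate resources with the region as the coarse-grained unit, and different regions usually have different endpoints.
namespace: the namespace of a set of monitoring metrics, that is, one of the cloud services packaged by the vendor, such as cloud servers or load balancers.
metrics: the specific monitoring metrics.

Cloud vendors usually walk through these five levels from top to bottom to locate a resource's specific monitoring metric.

For the same cloud vendor, collecting from different regions of different accounts requires running multiple exporters.

A single exporter corresponds to a single account and a single region.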
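
Because each exporter instance is bound to one account and one region, the number of processes grows as accounts × regions. The following is a minimal sketch of how the instances could be enumerated and given listen ports. The account names, regions, base port, and the function name are hypothetical placeholders, not values from any vendor.

```python
from itertools import product

# Hypothetical inventory: replace with your own accounts and regions.
ACCOUNTS = ["acct-a", "acct-b"]
REGIONS = ["us-west-2", "eu-west-1", "ap-southeast-1"]
BASE_PORT = 9106  # assumed starting port; any free range works


def plan_exporters(accounts, regions, base_port):
    """Return one exporter instance per (account, region) pair."""
    plan = []
    for offset, (account, region) in enumerate(product(accounts, regions)):
        plan.append({
            "name": f"{account}-{region}-exporter",
            "account": account,
            "region": region,
            # Each instance gets its own port so Prometheus can scrape it separately.
            "target": f"127.0.0.1:{base_port + offset}",
        })
    return plan


if __name__ == "__main__":
    for instance in plan_exporters(ACCOUNTS, REGIONS, BASE_PORT):
        print(instance["name"], instance["target"])
```

Each entry can then be turned into one container or systemd unit plus one Prometheus target carrying account and region labels.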

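Practice

AWS

AWS does not provide an official exporter for collection. This section uses the exporter provided by the Prometheus project, run with Docker.

Reference: GitHub cloudwatch_exporter

Billed per API call.
Pull model: Prometheus actively scrapes the exporter's metrics. The endpoint is ip:port/metrics.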
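
In the pull model the exporter only exposes plain text at /metrics, and Prometheus fetches it on every scrape. The following is a minimal sketch of reading such a payload by hand. The sample metric name and value are hypothetical, and the parser handles only simple `name{labels} value` lines, not the full exposition format.

```python
import re
import urllib.request

# Matches simple sample lines such as: name{label="v"} 12.5
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(\{.*\})?\s+(\S+)')
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_metrics(text):
    """Parse exposition text into (name, labels, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = SAMPLE_RE.match(line)
        if match is None:
            continue
        name, raw_labels, value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


def fetch_metrics(url, timeout=5):
    """Fetch the raw text from an exporter, e.g. http://ip:port/metrics."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


# Hypothetical payload, shaped like what an exporter might return.
SAMPLE = '''# HELP example_requests_sum Example metric
# TYPE example_requests_sum gauge
example_requests_sum{bucket_name="demo"} 42.0
'''

if __name__ == "__main__":
    print(parse_metrics(SAMPLE))
```

Calling `parse_metrics(fetch_metrics(...))` against a running exporter is a quick way to confirm it returns data before adding it to Prometheus.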

Example run script:

docker run -d --restart=always \
--name ***-us-west-2-cloudwatch \
-e AWS_ACCESS_KEY_ID=*** \
-e AWS_SECRET_ACCESS_KEY=***  \
-p 9106:9106 \
-v /etc/cloudwatch-exporter/***-us-west-2-config.yml:/config/config.yml \
prom/cloudwatch-exporter

Example configuration file:

# cat /etc/cloudwatch-exporter/***-us-west-2-config.yml
---
region: "us-west-2"
metrics:
- aws_dimensions:
  - BucketName
  - FilterId
  aws_metric_name: AllRequests
  aws_namespace: AWS/S3
  aws_statistics:
  - Sum
- aws_dimensions:
  - BucketName
  - FilterId
  aws_metric_name: 5xxErrors
  aws_namespace: AWS/S3
  aws_statistics:
  - Sum
- aws_dimensions:
  - BucketName
  - FilterId
  aws_metric_name: 4xxErrors
  aws_namespace: AWS/S3
  aws_statistics:
  - Sum
- aws_dimensions:
  - BucketName
  - FilterId
  aws_metric_name: TotalRequestLatency
  aws_namespace: AWS/S3
  aws_statistics:
  - Average
- aws_dimensions:
  - BucketName
  - FilterId
  aws_metric_name: PutRequests
  aws_namespace: AWS/S3
  aws_statistics:
  - Sum
- aws_dimensions:
  - BucketName
  - FilterId
  aws_metric_name: ListRequests
  aws_namespace: AWS/S3
  aws_statistics:
  - Sum
- aws_dimensions:
  - BucketName
  - FilterId
  aws_metric_name: HeadRequests
  aws_namespace: AWS/S3
  aws_statistics:
  - Sum
- aws_dimensions:
  - BucketName
  - FilterId
  aws_metric_name: GetRequests
  aws_namespace: AWS/S3
  aws_statistics:
  - Sum

- aws_namespace: AWS/EC2
  aws_metric_name: StatusCheckFailed_System
  aws_dimensions: [InstanceId]
  aws_statistics: [Average]
  set_timestamp: false

- aws_namespace: AWS/ELB
  aws_metric_name: HTTPCode_ELB_4XX
  aws_dimensions: [LoadBalancerName]
  aws_statistics: [Sum]
  set_timestamp: false

- aws_namespace: AWS/ELB
  aws_metric_name: HTTPCode_ELB_5XX
  aws_dimensions: [LoadBalancerName]
  aws_statistics: [Sum]
  set_timestamp: false

- aws_namespace: AWS/ELB
  aws_metric_name: HTTPCode_Backend_4XX
  aws_dimensions: [LoadBalancerName]
  aws_statistics: [Sum]
  set_timestamp: false
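
The eight S3 entries above differ only in metric name and statistic, so the file can be generated rather than edited by hand. The following sketch emits the same structure as plain text, using only the keys already shown in the configuration above. The function name is illustrative and not part of the exporter.

```python
# (metric name, statistic) pairs taken from the S3 section of the config above.
S3_METRICS = [
    ("AllRequests", "Sum"),
    ("5xxErrors", "Sum"),
    ("4xxErrors", "Sum"),
    ("TotalRequestLatency", "Average"),
    ("PutRequests", "Sum"),
    ("ListRequests", "Sum"),
    ("HeadRequests", "Sum"),
    ("GetRequests", "Sum"),
]


def render_config(region, namespace, dimensions, metrics):
    """Render an exporter config as YAML text, one entry per metric."""
    lines = ["---", f'region: "{region}"', "metrics:"]
    for metric_name, statistic in metrics:
        lines.append("- aws_dimensions:")
        lines.extend(f"  - {dimension}" for dimension in dimensions)
        lines.append(f"  aws_metric_name: {metric_name}")
        lines.append(f"  aws_namespace: {namespace}")
        lines.append("  aws_statistics:")
        lines.append(f"  - {statistic}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_config("us-west-2", "AWS/S3", ["BucketName", "FilterId"], S3_METRICS))
```

Since one exporter serves one account and one region, the same function can be called once per region to produce each instance's file.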

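Other exporters exist as well. If your monitoring layer uses Telegraf from the InfluxData stack, it can also collect CloudWatch metrics directly. That option is not covered further here.

Reference: GitHub telegraf cloudwatch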

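Huawei Cloud

Huawei Cloud officially provides cloudeye-exporter for collecting monitoring data.

Reference: GitHub huaweicloud cloudeye-exporter

Free of charge.
Rate limited. By default, fewer than 1000 calls per minute are allowed. To raise the limit, contact your business representative so it can be increased on the backend.
Pull model: Prometheus actively scrapes the exporter's metrics. The endpoint is ip:port/metrics.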

Example configuration file:

global:
  prefix: "huaweicloud"
  port: ":8087"
  metric_path: "/metrics"
  resource_sync_interval_minutes: 20
  scrape_batch_size: 300

auth:
  auth_url: "https://iam.cn-north-4.myhuaweicloud.com/v3"
  project_name: "cn-north-4"
  access_key: "***"
  secret_key: "***"
  region: "cn-north-4"

Huawei Cloud is a special case: the namespace is configured in Prometheus by passing params, not in the exporter's own configuration file.

Example Prometheus job:

- job_name: "hw-cloudeye-elb"
  scrape_interval: 60s
  scrape_timeout: 30s
  static_configs:
    - labels:
        account: "***"
        region: "cn-north-4"
        region_name: "北京4"
        cloud_provider: "hwcloud"
      targets:
        - 127.0.0.1:8087
  params:
    services: ['SYS.ELB']
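
Prometheus appends each entry under params to the scrape URL as a query parameter, which is how the exporter learns which namespace to return. The following sketch builds the URL that results from the job above using only the standard library. The helper name is illustrative.

```python
from urllib.parse import urlencode


def scrape_url(target, params, metrics_path="/metrics", scheme="http"):
    """Build the URL Prometheus would request for a target with params."""
    base = f"{scheme}://{target}{metrics_path}"
    if not params:
        return base
    # doseq=True expands list values into repeated key=value pairs.
    return f"{base}?{urlencode(params, doseq=True)}"


if __name__ == "__main__":
    # Values copied from the job above.
    print(scrape_url("127.0.0.1:8087", {"services": ["SYS.ELB"]}))
```

Requesting that URL by hand is a simple way to check that a namespace returns data before adding a job for it.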

Example systemd unit file:

# cat /etc/systemd/system/cloudeye_exporter_cn-north-4.service 
[Unit]
Description=Huawei Cloud Eye Exporter
After=network.target

[Service]
Type=simple
User=root
Group=root
Nice=-5
ExecStart=/usr/local/cloudeye-exporter/cloudeye-exporter \
     -config=/usr/local/cloudeye-exporter/clouds-cn-north-4.yml

SyslogIdentifier=cloudeye_exporter
Restart=always

[Install]
WantedBy=multi-user.target

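Alibaba Cloud

Alibaba Cloud officially provides an exporter for exporting monitoring data.

Reference: Alibaba Cloud docs

Billed per API call.
Push model: the exporter calls the Prometheus remote write API to write the data remotely. For Prometheus itself to accept writes on /api/v1/write, it must be started with --web.enable-remote-write-receiver.
On the IAM side, grant the AliyunCloudMonitorReadOnlyAccess permission.

Example configuration file: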

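# Server-side endpoint, config file location, log level, etc.
serverconf:
  service_endpoint: metrics.cn-beijing.aliyuncs.com # endpoint address
  port: 9123                                         # listen port
  page_size: 300                                     # page size for data queries
  log_dest: 1                                        # 1: standard output; 2: file
  log_dir:                                           # log file location
  log_level: Info                                    # log level
  http_proxy:                                        # HTTP proxy
  https_proxy:                                       # HTTPS proxy
  no_proxy:                                          # no_proxy setting
  no_meta: true                                      # controls whether meta information is added
  no_savepoint: true                                 # controls whether progress is recorded
  no_tag_prefix: true                                # controls whether labels get the 'tag_' prefix

# Write address and authentication of the remote Prometheus
remote_prom:
  endpoint: http://127.0.0.1:9090/api/v1/write  # address of your Prometheus instance
  basic_auth:

# Account information, used to query tags, descriptions, and similar data
credential:
  user_id: 123123123     # main account; only the main account ID is accepted
  access_key: "*****"    # the user's AK (quoted, since a bare * is a YAML alias)
  access_secret: "*****" # the user's SK

# Labels to attach to the written data; delete this section if not needed
datatag:
  - {key: cloud_provider, val: aliyun}
  - {key: region, val: cn-beijing}

# Products and metrics to export
products:
  - namespace: acs_alb            # product to export
    period: 60                    # overall Period for this product's metrics
    metric_info:
    - metric_list: [ListenerQPS]  # a group of metrics to export
      period: 60                  # Period for this group of metrics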

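Example systemd unit file:

Because exporter_local does not support explicitly specifying the configuration file path, set WorkingDirectory in the systemd unit to specify its working directory.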

# cat /etc/systemd/system/aliyun_exporter.service 
[Unit]
Description=Aliyun Cloud Monitor Exporter
After=network.target

[Service]
Type=simple
User=root
Group=root
Nice=-5
WorkingDirectory=/usr/local/aliyun_cloud_monitor
ExecStart=/usr/local/aliyun_cloud_monitor/exporter_local

SyslogIdentifier=aliyun_exporter
Restart=always

[Install]
WantedBy=multi-user.target

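Volcengine

Volcengine officially provides an exporter for exporting monitoring data.

Reference: Volcengine docs

Free of charge during the public beta.
Rate limited.
Metric namespaces must be allowlisted in advance by contacting your business representative. Otherwise scraping returns errors.
Pull model: Prometheus actively scrapes the exporter's metrics. The endpoint is ip:port/metrics.

Example configuration file: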

Region: "cn-beijing"
Credentials:
  AccessKey: "*****"
  SecretKey: "*****"
ExtraLabels: ["Name"]
Products:
  - Namespace: "VCM_ALB"

Example systemd unit file:

# cat hsyun_exporter.service 
[Unit]
Description=Hsyun Cloud Monitor Exporter
After=network.target

[Service]
Type=simple
User=root
Group=root
Nice=-5
ExecStart=/usr/local/volcengine_cloud_monitor/exporter \
  --config /usr/local/volcengine_cloud_monitor/config.yaml \
  --port=9898

SyslogIdentifier=volcengine_exporter
Restart=always

[Install]
WantedBy=multi-user.target

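Kingsoft Cloud

Kingsoft Cloud officially provides an exporter for exporting monitoring data.

Reference: Kingsoft Cloud docs

Free of charge.
Rate limited.
Pull model: Prometheus actively scrapes the exporter's metrics. The endpoint is ip:port/metrics.
On the IAM side, grant the MonitorReadOnlyAccess permission. Some services also need to get the resource list, which requires separate permissions, for example SLBReadOnlyAccess.

Example configuration file: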

rate_limit: 15

credential:
  access_key: "****"
  secret_key: "****"
  region: cn-beijing-6

product_conf:
  - namespace: SLB
    only_include_metrics:
      - slb.req_rate
    reload_interval_minutes: 60

Example systemd unit file:

# cat /etc/systemd/system/ksyun_exporter.service
[Unit]
Description=Ksyun Cloud Monitor Exporter
After=network.target

[Service]
Type=simple
User=root
Group=root
Nice=-5
ExecStart=/usr/local/ksyun_cloud_monitor/ksc_exporter \
--config.file /usr/local/ksyun_cloud_monitor/config.yaml \
--web.listen-address=":9120"

SyslogIdentifier=ksyun_exporter
Restart=always

[Install]
WantedBy=multi-user.target

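Conclusion

In essence, every approach calls the public cloud's API to fetch metrics and write them locally. Some public clouds have simply packaged this into a product that makes the calls more convenient.

The approaches above solve the storage problem, but on the consumption side there are still open questions.

For the same kind of service metric, for example layer-7 LB QPS, five clouds produce five different metrics. From a global perspective, having multiple metrics makes PromQL more complex to write.

One option is to develop a unified plugin that flattens the same metric from different public clouds into one before it lands in storage.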
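
A minimal sketch of what such flattening could look like, assuming samples have already been parsed into dictionaries. Every metric name here is a hypothetical placeholder rather than real exporter output, and the unified name lb_l7_qps is likewise invented for illustration.

```python
# Hypothetical mapping: (cloud_provider label, vendor metric name) -> unified name.
# None of these vendor-side names are real exporter output.
UNIFIED_NAMES = {
    ("aws", "vendor_a_lb_request_rate"): "lb_l7_qps",
    ("aliyun", "vendor_b_listener_qps"): "lb_l7_qps",
    ("hwcloud", "vendor_c_l7_qps"): "lb_l7_qps",
}


def normalize(sample):
    """Rename a sample to its unified metric name, if a mapping exists.

    A sample is a dict with "name", "labels", and "value" keys.
    """
    provider = sample["labels"].get("cloud_provider")
    unified = UNIFIED_NAMES.get((provider, sample["name"]))
    if unified is None:
        return sample  # pass unmapped metrics through unchanged
    # Keep the original name as a label so the source stays traceable.
    labels = dict(sample["labels"], source_metric=sample["name"])
    return {"name": unified, "labels": labels, "value": sample["value"]}


if __name__ == "__main__":
    raw = {
        "name": "vendor_b_listener_qps",
        "labels": {"cloud_provider": "aliyun", "region": "cn-beijing"},
        "value": 12.0,
    }
    print(normalize(raw))
```

With names unified this way, one PromQL expression grouped by cloud_provider could cover all clouds instead of one expression per vendor metric. Differences in units or aggregation between vendors would still need handling and are not addressed in this sketch.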

Posted in: Public Cloud Monitoring

2024-10-10

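Reprint notice: unless otherwise stated, articles on this site are published under the CC-4.0 license. When reprinting, please credit the source: https://www.opshub.cn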
